Effective search space reduction for spell correction using character neural embeddings
نویسنده
چکیده
We present a novel, unsupervised, and distance measure agnostic method for search space reduction in spell correction using neural character embeddings. The embeddings are learned by skip-gram word2vec training on sequences generated from dictionary words in a phonetic informationretentive manner. We report a very high performance in terms of both success rates and reduction of search space on the Birkbeck spelling error corpus. To the best of our knowledge, this is the first application of word2vec to spell correction.
منابع مشابه
Unsupervised Context-Sensitive Spelling Correction of English and Dutch Clinical Free-Text with Word and Character N-Gram Embeddings
We present an unsupervised context-sensitive spelling correction method for clinical free-text that uses word and character n-gram embeddings. Our method generates misspelling replacement candidates and ranks them according to their semantic fit, by calculating a weighted cosine similarity between the vectorized representation of a candidate and the misspelling context. To tune the parameters o...
متن کاملUnsupervised Context-Sensitive Spelling Correction of Clinical Free-Text with Word and Character N-Gram Embeddings
We present an unsupervised contextsensitive spelling correction method for clinical free-text that uses word and character n-gram embeddings. Our method generates misspelling replacement candidates and ranks them according to their semantic fit, by calculating a weighted cosine similarity between the vectorized representation of a candidate and the misspelling context. We greatly outperform two...
متن کاملDisambiguation and spelling correction for a neural network based character recognition system
Various approaches have been proposed over the years for using contextual and linguistic information to improve the recognition rates of existing OCR systems. However, there is an intermediate level of information that is currently underutilized for this task: confidence measures derived from the recognition system. This paper describes a high-performance recognition system that utilizes identi...
متن کاملDetermining Effective Features for Face Detection Using a Hybrid Feature Approach
Detecting faces in cluttered backgrounds and real world has remained as an unsolved problem yet. In this paper, by using composition of some kind of independent features and one of the most common appearance based approaches, and multilayered perceptron (MLP) neural networks, not only some questions have been answered, but also the designed system achieved better performance rather than the pre...
متن کاملChinese Spell Checking Based on Noisy Channel Model
Chinese spell checking is an important component of many NLP applications, including word processors, search engines, and automatic essay rating. Compared to English, Chinese has no word boundaries and there are various Chinese input methods that cause different kinds of typos, so it is more difficult to develop spell checkers for Chinese. In this paper, we introduce a novel method for correcti...
متن کامل